Abstract

Given a finite nonempty sequence S of integers, write it as XY^k, where Y^k is a power of greatest
exponent that is a suffix of S: this k is the curling number of S. The Curling Number Conjecture is
that if one starts with any initial sequence S, and extends it by repeatedly appending the curling
number of the current sequence, the sequence will eventually reach 1. The conjecture remains
open. In this paper we discuss the special case when S consists just of 2's and 3's. Even this case
remains open, but we determine how far a sequence of n 2's and 3's can extend before reaching a
1, conjecturally for n ≤ 80. We investigate several related combinatorial problems, such as finding
c(n, k), the number of binary sequences of length n and curling number k, and t(n, i), the number
of sequences of length n which extend for i steps before reaching a 1. A number of interesting
combinatorial problems remain unsolved.

1 The Curling Number Conjecture
Given a finite nonempty sequence S of integers, write it as S = XY^k, where X and Y are sequences
of integers and Y^k is a power of greatest exponent that is a suffix of S: this k is the curling number
of S, denoted by cn(S). X may be the empty sequence ε; there may be several choices for Y,
although the shortest such Y which achieves k (which as we shall see in §3.1 is primitive) is unique.

For example, if S = 0 1 2 2 1 2 2 1 2 2, we could write it as XY^2, where X = 0 1 2 2 1 2 2 1 and
Y = 2, or as XY^3, where X = 0 and Y = 1 2 2. The latter representation is to be preferred, since
it has k = 3, and as k = 4 is impossible, the curling number of this S is 3.
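The definition can be turned into a short brute-force routine. The following is a minimal sketch, not the authors' program: the function name `curling_number` and the representation of sequences as Python lists are assumptions made for illustration. It tries every candidate block length |Y| and counts how many copies of that block end the sequence.

```python
def curling_number(seq):
    """Largest k such that seq = X + Y*k for some nonempty block Y."""
    n = len(seq)
    if n == 0:
        raise ValueError("curling number is only defined for nonempty sequences")
    best = 1
    for length in range(1, n // 2 + 1):          # k >= 2 requires 2*|Y| <= n
        block = seq[n - length:]
        k = 1
        # Count how many consecutive copies of `block` end the sequence.
        while (k + 1) * length <= n and seq[n - (k + 1) * length:n - k * length] == block:
            k += 1
        best = max(best, k)
    return best


# The example from the text: S = 0 1 2 2 1 2 2 1 2 2 has curling number 3 (Y = 1 2 2).
print(curling_number([0, 1, 2, 2, 1, 2, 2, 1, 2, 2]))
```

The search is quadratic in the length of the sequence, which is adequate for checking small examples by hand.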
The following conjecture was stated by van der Bult et al. [2]: 

Conjecture 1. The Curling Number Conjecture. If one starts with any initial sequence of integers 
S, and extends it by repeatedly appending the curling number of the current sequence, the sequence 
will eventually reach 1. 



x To whom correspondence should be addressed. 

2 A preliminary report on the work in Section 2 was given in [3] 
In other words, if S_0 = S is any finite nonempty sequence of integers, and we define S_{m+1} to
be the concatenation

    S_{m+1} := S_m cn(S_m)  for m ≥ 0,    (1)

then the conjecture is that for some t ≥ 0 we will have cn(S_t) = 1.

For example, suppose we start with S_0 = 2 3 2 3. By taking X = ε, Y = 2 3, we have S_0 = Y^2, so
cn(S_0) = 2, and we get S_1 = 2 3 2 3 2. By taking X = 2, Y = 3 2 we get cn(S_1) = 2, S_2 = 2 3 2 3 2 2.
By taking X = 2 3 2 3, Y = 2 we get cn(S_2) = 2, S_3 = 2 3 2 3 2 2 2. Again taking X = 2 3 2 3, Y = 2
we get cn(S_3) = 3, S_4 = 2 3 2 3 2 2 2 3. Now, unfortunately, it is impossible to write S_4 = XY^k with
k > 1, so cn(S_4) = 1, S_5 = 2 3 2 3 2 2 2 3 1, and we have reached a 1, as predicted by the conjecture.
(If we continue the sequence from this point, it joins Gijswijt's sequence, discussed in §5.)
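Rule (1) is easy to replay mechanically. The sketch below is illustrative only: the names `curling_number` and `extend`, and the safety cap `max_steps`, are assumptions rather than anything defined in the paper. It appends curling numbers until the current sequence has curling number 1 and returns that sequence together with the number of terms appended.

```python
def curling_number(seq):
    """Largest k such that seq = X + Y*k for some nonempty block Y."""
    n, best = len(seq), 1
    for length in range(1, n // 2 + 1):
        block = seq[n - length:]
        k = 1
        while (k + 1) * length <= n and seq[n - (k + 1) * length:n - k * length] == block:
            k += 1
        best = max(best, k)
    return best


def extend(start, max_steps=10_000):
    """Apply rule (1) until cn(S_t) = 1; return (S_t, t)."""
    seq = list(start)
    for t in range(max_steps + 1):
        k = curling_number(seq)
        if k == 1:
            return seq, t
        seq.append(k)          # S_{m+1} := S_m cn(S_m)
    raise RuntimeError("no 1 reached within max_steps (hypothetical safety cap)")


# Replays the worked example: S_0 = 2 3 2 3 stops at S_4 = 2 3 2 3 2 2 2 3.
print(extend([2, 3, 2, 3]))
```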

Some of the proofs in van der Bult et al. [2] could be shortened and the results strengthened 
if the conjecture were known to be true. All the available evidence suggests that the conjecture is 
true, but it has so far resisted all attempts to prove it. 

In this paper we report on some extensive investigations into the case when the starting sequence 
consists of 2's and 3's (although even in this special case the conjecture remains open). 

In Section 2 we study how far a starting sequence of n 2's and 3's can extend before reaching
a 1. Call the maximum such length Ω(n). We determine Ω(n) for all n ≤ 48, and conjecturally for
all n ≤ 80 (Table 1 and Figure 1). The data suggest some properties that should be possessed by
especially good starting sequences (Properties P2, P3, P4 in §2.2). Although we have not found any 
algebraic construction for good starting sequences, Section 2.3 describes a method which sometimes 
succeeds in building starting sequences of greater length. The algorithm which allowed us to extend 
the search to length 80 is discussed in §2.4. We would not be surprised if the conjecture in this 
special case turns out to be a consequence of known results on the unavoidability of patterns in 
long binary sequences — we discuss this briefly in §2.5. 

Section 3 is devoted to the combinatorial question: what is the number c(n, k) of binary
sequences of length n and curling number k? This seems to be a surprisingly difficult problem, and
we have succeeded only in relating c(n, k) to two subsidiary quantities: p(n, k), the number of such
sequences that are primitive, and p'(n, k), the number that are both primitive and robust (see
§3.1). The main results of this section are the formulas for c(n, k) in Theorems 6 and 19. With
their help we are able to enumerate the curling numbers of all binary sequences of length n ≤ 104.
The resulting table can be seen in entry A216955 (see footnote 3) in [8]. The number of binary sequences with
curling number 1, c(n, 1) (A122536), is especially interesting and is discussed in §3.4. Some further
recurrences given there enable us to compute c(n, 1) for n ≤ 200 (although we still do not know an
explicit formula). We make frequent use of the classical Fine-Wilf theorem, and it and two other
preliminary results are given in §3.2. The differences d(n, k) := 2c(n − 1, k) − c(n, k) show the
structure of the c(n, k) table more clearly than the numbers c(n, k) themselves, and are the subject
of §3.6.

In Section 4, we study the number t(n, i) of sequences of length n with tail length i, where
0 ≤ i ≤ Ω(n). By direct search we have determined t(n, i) for n ≤ 48 (A217209), although without
finding any recurrences (except for t(n, 0), which is the same as c(n, 1)). The terms in each row of
the t(n, i) table occur in clumps, at least for n ≤ 48. In §4.1 and §4.2 we investigate some statistics
of the t(n, i) table, although we are a long way from finding a model which explains the clumps.
Sections 4.3, 4.4, 4.5 discuss some combinatorial questions related to tail lengths. If the starting
sequence S_0 is sufficiently long, it seems plausible that prefixing S_0 with a 2 or 3 is unlikely to
decrease the tail length. If one of these prefixes decreases the tail length, we call S_0 rotten, and if

3 Throughout this article, six-digit numbers prefixed by A refer to entries in the OEIS [8]. 
both prefixes 2 and 3 decrease the tail length we call it doubly rotten. Rotten sequences certainly
exist, but up to length 30 there are no doubly rotten sequences, and we conjecture that none exist of
any length (see Conjecture 3). If this conjecture were true, it would explain a certain phenomenon
that we observed in §2.2, and it would also imply that Ω(n + 1) ≥ Ω(n) for all n, something that
we do not know at present.

In Section 5 we briefly describe Gijswijt's sequence (A090822), which was the starting point for 
this investigation. The last section summarizes the open problems mentioned in the paper. 

Notation. Since the starting sequence S can be any sequence of integers, it seems appropriate in
this paper to speak about "sequences" rather than "words" over some alphabet. However, we will
make use of certain terminology (such as "prefix", "suffix") from formal language theory (cf. [6]).

Sequences will be denoted by upper case Latin letters. S^k means SS···S, where S is repeated
k times. The length of S is denoted by |S|. ε denotes the empty sequence.

Sets of sequences will be denoted by script letters (e.g., C(n, k)) and their cardinalities by the
corresponding lower case Latin letters (e.g., c(n, k)). Greek letters and other lower case Latin letters
will also denote numbers. The symbol # denotes the cardinality of a set.

The curling number of S is denoted by cn(S). For a starting sequence S_0 := s_1 s_2 ··· s_n of
length n, where the s_i are arbitrary integers, we define S_{m+1} to be the concatenation S_m cn(S_m) =
s_1 ··· s_{n+m+1} for m ≥ 0. If cn(S_t) = 1 for some t ≥ 0, then we call the smallest such t the tail
length of S_0, denoted by t(S_0), and the corresponding sequence S^(e) := S_t = s_1 ··· s_{n+t} is the
extension of S_0. If no such t exists, then we set t(S_0) = ∞, S^(e) = S_∞ (and the Curling Number
Conjecture would be false).

2 Sequences of 2's and 3's 
Table 1: Lower bounds on Ω(n), the maximal tail length that can be achieved before a 1 appears,
for any starting sequence S_0 of n 2's and 3's. Entries for n ≤ 48 are known to be exact; the other
entries are conjectured to be exact.
2.1 Maximal tail length Ω(n).

One way to approach the conjecture is to consider the simplest nontrivial case, where the initial
sequence S_0 contains only 2's and 3's, and see how far such a sequence can extend using the rule
(1) before reaching a 1. Perhaps if one were sufficiently clever, one could invent a starting sequence
that would never reach 1, which would disprove the conjecture. Of course it cannot reach a number
greater than 3, either, for the first time this happens the next term will be 1. So the sequence must
remain bounded between 2 and 3. Unfortunately, even this apparently simple case has resisted our
attempts to solve it. At the end of this section (see §2.5) we will mention some slight evidence that
suggests the conjecture is true. First we report on our numerical experiments.

Let Ω(n) denote the maximal tail length that can be achieved before a 1 appears, for any
starting sequence S_0 consisting of n 2's and 3's. If a 1 is never reached, we set Ω(n) = ∞. The
Curling Number Conjecture would imply Ω(n) < ∞ for all n.

By direct search, we have found Ω(n) for all n ≤ 48. (The values for n ≤ 30 were given in [2].)
The results are shown in Table 1 and Figure 1, together with lower bounds (which we conjecture
are in fact equal to Ω(n)) for 49 ≤ n ≤ 80. The values of Ω(n) also form sequence A217208 in [8].
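For very small n the direct search can be reproduced with a few lines of code. This is a naive sketch under stated assumptions: the helper names `curling_number`, `tail_length` and `omega` are hypothetical, every one of the 2^n starting sequences is tried, and none of the tuning described in §2.4 is attempted, so it is only practical for short sequences.

```python
from itertools import product


def curling_number(seq):
    """Largest k such that seq = X + Y*k for some nonempty block Y."""
    n, best = len(seq), 1
    for length in range(1, n // 2 + 1):
        block = seq[n - length:]
        k = 1
        while (k + 1) * length <= n and seq[n - (k + 1) * length:n - k * length] == block:
            k += 1
        best = max(best, k)
    return best


def tail_length(start, max_steps=10_000):
    """Smallest t with cn(S_t) = 1, where S_0 = start and S_{m+1} = S_m cn(S_m)."""
    seq = list(start)
    for t in range(max_steps + 1):
        k = curling_number(seq)
        if k == 1:
            return t
        seq.append(k)
    raise RuntimeError("no 1 reached within max_steps (hypothetical safety cap)")


def omega(n):
    """Maximal tail length over all 2^n starting sequences of 2's and 3's."""
    return max(tail_length(s) for s in product((2, 3), repeat=n))


print([omega(n) for n in range(1, 7)])
```

The values at n = 2 and n = 4 are attained by the starting sequences 2 2 and 2 3 2 3 respectively.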
Figure 1: Scatter-plot of lower bounds on Ω(n), the maximal tail length that can be achieved before
a 1 appears, for any starting sequence S_0 of n 2's and 3's. Entries for n ≤ 48 are known to be
exact; the other entries are conjectured to be exact.

In [2], before we began computing Ω(n), we did not know how fast it would grow: would it be a
polynomial, exponential, or other function of n? Even now we still do not know, since we have only
limited data. But up to n = 48, and probably up to n = 80, Ω(n) is a piecewise constant function
of n. There are occasional jump points, where Ω(n) > Ω(n − 1), but in between jump points Ω(n)
does not change. Of course this piecewise constant behavior is not incompatible with polynomial
or exponential growth, if the jump points are close enough together, but up to n = 80 this seems
not to be the case. There are long stretches where Ω(n) is flat. A probabilistic argument will be
given in §4 which suggests (not very convincingly) that, on the average, Ω(n) may be roughly c_1 n,
for a constant c_1 ≈ 1.34. Up to n = 49, Ω(n) never decreases, although we cannot prove that this
is always true (see §4.3).

The jump points are at n = 1,2,4,6,8,9,10,11,14,19,22,48 and we believe the next three 
values are 68, 76 and 77 (A160766). 



2.2 Properties of good starting sequences. 

From n = 2 through 48 (and probably through n = 80) the starting sequences S_0 which achieve
Ω(n) at the jump points are unique. These especially good starting sequences are listed in Tables
2 and 3. For 2 ≤ n ≤ 48 (and probably for 2 ≤ n ≤ 80) these sequences S_0 also have the following
properties:

(P2) S_0 begins with 2.

(P3) S_0 does not contain the subword 3 3.

(P4) S_0 contains no nonempty subword of the form V^4 (and in particular does not contain
2222).
 n   Starting sequence

 2   22
 4   2323
 6   222322
 8   23222323
 9   223222323
10   2323222322
11   22323222322
14   22323222322323
19   2232232322232232232
22   2322322323222323223223
48   223223232223222322322232232322232223223222322323

Table 2: Starting sequences of n 2's and 3's for which Ω(n) > Ω(n − 1), complete for 1 < n ≤ 48.

 n   Starting sequence

68   223223222322323222322232232223223222322323222322
     23223222322322232232
76   232223223222322323222322232232232223223222322323
     2223222322322322232232223223
77   223222323222322232232223222323222322232232223222
     32322232223223232223223222323

Table 3: Starting sequences of n 2's and 3's for which Ω(n) > Ω(n − 1), conjectured to be complete
for 49 ≤ n ≤ 80.

These are empirical observations. However, since they certainly hold for the first 2^49 − 1 choices
for S_0, we venture to make the following conjecture:

Conjecture 2. If a starting sequence S_0 of length n ≥ 2 achieves Ω(n) where Ω(n) > Ω(n − 1),
then S_0 is unique and has properties P2, P3 and P4.

We can at least prove one result about these especially good starting sequences. Let S_0 =
s_1 s_2 ··· s_n be any sequence of integers with extension S^(e) = S_t = s_1 ··· s_{n+t}, where cn(S_t) = 1.
Call S_0 weak if each S_r (r = 0, ..., t − 1) can be written as XY^{s_{n+r+1}} with X ≠ ε. In other
words, S_0 is weak if the initial term s_1 is not necessary for the computation of the curling numbers
s_{n+1}, ..., s_{n+t}. This implies that t(S_0) = t(s_2 ··· s_n), and establishes

Lemma 1. If a starting sequence S_0 of length n ≥ 2 achieves Ω(n) > Ω(n − 1), then S_0 is not
weak.

One further empirical observation is worth recording, concerning the starting sequences between
jump points. Suppose n_0, n_1 are consecutive jump points, so that

    Ω(n) = Ω(n − 1)  for n_0 < n < n_1 ,

and Ω(n) > Ω(n − 1) at n = n_0 and n_1. Then for n ≤ 48 and conjecturally for n ≤ 80, if
n_0 < n < n_1, one can obtain a starting sequence that achieves Ω(n) by taking the starting sequence
of length n_0 and prefixing it by a "neutral" string of n − n_0 2's and 3's that do not get used in
the computation of Ω(n). Although this is not surprising, we are unable to prove that such neutral
prefixes must always exist. We return to this topic in §4.3.

The large gaps between the jump points at 22 and 48 and between 48 and 68 are especially
noteworthy. In particular, we have

    Ω(n) = 120  for 22 ≤ n ≤ 47 ,    (2)

and, conjecturally,

    Ω(n) = 131  for 48 ≤ n ≤ 67 .    (3)

The data shown in Tables 1, 2, 3 and Figure 1 for n in the range 49 to 80 were obtained 
by computer search under the assumption that the starting sequence has the properties P3 and 
P4 mentioned above, although without making any assumption about uniqueness. As it turned 
out, assuming P3 and P4, the best starting sequences at the jump points are indeed unique and 
start with 2. Assuming P3 and P4 greatly reduces the number of starting sequences that must 
be considered. For example, simply excluding sequences that contain four consecutive 2's or four 
consecutive 3's reduces the number of candidates of length n from 2^n to a constant times c_2^n, where
c_2 = 1.839··· (cf. A135491). However, this by itself is not enough to enable us to reach n = 80.
We discuss the algorithm that we used in more detail in §2.4. 
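The growth constant quoted above can be checked numerically. The sketch below is an illustration, not part of the authors' search program; the function name `count_no_run_of_four` is an assumption. It counts sequences of 2's and 3's with no four equal consecutive terms by tracking the length of the final run, and the ratio of successive counts approaches 1.839..., the real root of x^3 = x^2 + x + 1.

```python
def count_no_run_of_four(n):
    """Number of length-n sequences of 2's and 3's with no four equal consecutive terms (n >= 1)."""
    # runs[j] = number of valid sequences of the current length whose final run has length j + 1
    runs = [2, 0, 0]
    for _ in range(n - 1):
        # Either start a new run with the other symbol, or lengthen the final run by one.
        runs = [sum(runs), runs[0], runs[1]]
    return sum(runs)


print([count_no_run_of_four(n) for n in range(1, 6)])
print(count_no_run_of_four(40) / count_no_run_of_four(39))
```

Note that this only models the "four consecutive equal terms" restriction mentioned in the text; the full properties P3 and P4 exclude more sequences.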

2.3 A construction for larger n. 

We have not succeeded in finding any algebraic constructions for good starting sequences. However, 
one simple construction enables us to obtain lower bounds on Ω(n) for some larger values of n. Let
S_0 be a sequence of length n that achieves Ω(n), and let S^(e) be its extension of length n + Ω(n).
Then in some cases the starting sequence S^(e) S_0 will extend to S^(e) S^(e) 2 and beyond before reaching
a 1. For example, taking S_0 to be the length 48 sequence in Table 2, the sequence S^(e) S_0 has length
179 + 48 = 227 and extends to a total length of 596 before reaching a 1, showing that Ω(227) ≥ 269.
2.4 Computational details.

Our results are complete for n ≤ 48 and are probably complete through n = 80. In order to extend
the search this far, the algorithms used were specifically tuned to the case of sequences of 2's and
3's. There is no easy way (as far as we know) to avoid the basic process of computing the extension
of S (compute cn(S), append it to S, and repeat until cn(S) = 1), and so the focus is on computing
cn(S) quickly. In the following discussion we assume that S has length at least 36. The first step
is brute force: look up the curling number cn(s_{n−35} ··· s_n) in a table. Two bits are sufficient to
record cn, since we only care about whether it is 1, 2, 3 or ≥ 4; at two bits per entry, this table
occupies 16 gigabytes. This provides a lower bound on cn(S), and also gives a lower bound for the
length of the repeated substring Y which maximizes cn(S). For example, if cn(s_{n−35} ··· s_n) = 1,
then any Y which gives cn(S) > 1 must be at least 19 digits long, or there would have been two
copies within the last 36 digits of S.

There is also an upper bound on the length of Y. Since we are looking for that Y which
maximizes cn(S), we are only interested in Y's which could be repeated more times than the
current best known value of cn(S). For example, if we know cn(S) ≥ 3, then we only want a Y
which is repeated four times, and so we only need consider lengths up to the length of S divided
by 4.

We now consider the last n digits of S as a candidate for Y, for all values of n between the 
lower and upper bounds. The sequences are represented as 128-bit binary numbers, and so looking 
for repetitions of Y can be done with bit manipulation. A few shifts and OR's generate 4 copies 
of Y, or as many as will fit in 128 bits. Then an XOR finds digits in which this differs from S, a 
bit scan locates the index of the first difference, and we can divide by the length of Y to find how 
many times this Y is repeated. (In fact, all divisions are done with precomputed tables.) If some Y 
increases the best known value for cn(S), then the upper bound on the length of Y can be revised
downwards. If we reach 128 digits and are still going, we resort to a slow string-based routine. In 
practice this slow routine accounts for less than 1% of the program's execution time. 
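The upper bound on |Y| can be illustrated without any bit manipulation. The following sketch is an assumption-laden simplification: it works on Python tuples rather than 128-bit integers, it omits the 36-digit lookup table, and the names `curling_number_naive` and `curling_number_pruned` are hypothetical. It only shows how the search range shrinks as the best known curling number grows.

```python
from itertools import product


def curling_number_naive(seq):
    """Reference version: try every block length up to len(seq) // 2."""
    n, best = len(seq), 1
    for length in range(1, n // 2 + 1):
        k = 1
        while (k + 1) * length <= n and seq[n - (k + 1) * length:n - k * length] == seq[n - length:]:
            k += 1
        best = max(best, k)
    return best


def curling_number_pruned(seq):
    """Same result, but stop once no longer block could beat the current best."""
    n, best, length = len(seq), 1, 1
    # A block repeated best + 1 times must fit inside seq, so |Y| <= n // (best + 1).
    while length <= n // (best + 1):
        k = 1
        while (k + 1) * length <= n and seq[n - (k + 1) * length:n - k * length] == seq[n - length:]:
            k += 1
        if k > best:
            best = k               # tightens the bound tested by the outer loop
        length += 1
    return best


# Cross-check the two versions on all short sequences of 2's and 3's.
agree = all(curling_number_naive(s) == curling_number_pruned(s)
            for n in range(1, 11) for s in product((2, 3), repeat=n))
print(agree)
```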

To compute the conjectured values up to length 80, we exclude (most) strings containing 3 3
or a subword V^4. Obviously we cannot check all 2^80 strings to see if they violate one of these
conditions, so we need an efficient way to avoid considering them at all. To do this, we compute a
256 × 256 table which lists, for every string of length 8, all the length-8 strings which could legally
follow it. We then construct S recursively in 8-digit blocks, ensuring that the rules are not broken
within any two consecutive blocks. This is not perfect (it will allow a V^4 to slip by if V is 9 digits
long, for example), but it efficiently eliminates the vast majority of undesirable cases.
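A small-scale version of the block table might look as follows. This is a sketch under assumptions, not the authors' data structure: blocks are stored as tuples, the table is a dictionary keyed only by blocks that are themselves legal (rather than a full 256 × 256 array), and the names `is_legal`, `BLOCKS` and `FOLLOWERS` are hypothetical.

```python
from itertools import product


def is_legal(seq):
    """True if seq contains no subword 3 3 and no nonempty fourth power V^4."""
    n = len(seq)
    for i in range(n - 1):
        if seq[i] == 3 and seq[i + 1] == 3:
            return False
    for v in range(1, n // 4 + 1):                 # candidate |V|
        for i in range(n - 4 * v + 1):             # start of a possible V^4
            if seq[i:i + v] * 4 == seq[i:i + 4 * v]:
                return False
    return True


# Length-8 blocks that do not break the rules on their own.
BLOCKS = [b for b in product((2, 3), repeat=8) if is_legal(b)]

# For each legal block, the blocks that may follow it without breaking the rules
# inside the 16-digit window formed by the two blocks.
FOLLOWERS = {a: [b for b in BLOCKS if is_legal(a + b)] for a in BLOCKS}

print(len(BLOCKS), sum(len(f) for f in FOLLOWERS.values()))
```

As the text notes, checking only pairs of consecutive blocks cannot catch a fourth power V^4 with |V| ≥ 5, since it does not fit in a 16-digit window.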

2.5 Unavoidable regularities. 

One reason we think the Curling Number Conjecture may be true, at least in the special case of 
sequences of 2's and 3's, is that there are several theorems in formal language theory about the 
inevitability of regularities in long binary strings. A classical example is Shirshov's theorem [6, 
Theorem 7.1.4], [7, Theorem 2.4.3]. Unfortunately that does not quite do what we need, but it 
does offer hope that a proof along these lines may exist. Lyndon's theorem [6, p. 67] is another 
example. Suppose we have a very long sequence of 2's and 3's generated by (1), and consider 
its canonical decomposition into Lyndon words. There are relatively few Lyndon words that are 
possible (e.g., 2222 is forbidden), but since this attack has not yet led to a contradiction we shall 
say no more about it. 
3 Number of binary sequences with given curling number.



In this section we study the number c(n, k) of binary sequences of length n and curling number k. 
For consistency with the other sections, we continue to consider sequences of 2's and 3's, although 
for this question any alphabet of size 2 (such as {0,1}) would do equally well. 

3.1 Primitive and robust sequences. 

A sequence S is imprimitive (or periodic) if it is equal to T^i for some sequence T and an integer
i ≥ 2. Otherwise, S is primitive [6, p. 7].

Lemma 2. Suppose S has curling number k. Then S can be written as XY^k, possibly in several
ways. The shortest such Y is primitive and unique, and has curling number < k if k > 1, curling
number 1 if k = 1.

Proof. If k > 1 and cn(Y) = k then Y would not be minimal. If Y were not primitive, say Y = T^i,
i ≥ 2, then cn(S) ≥ ki > k. Uniqueness is clear. □

We denote the length of this shortest Y by π. We let C(n, k, π) denote the set of all S with
the given values of n, k, and π, c(n, k, π) := #C(n, k, π), C(n, k) := ⋃_{π=1}^{⌊n/k⌋} C(n, k, π), and

    c(n, k) := #C(n, k) = Σ_{π=1}^{⌊n/k⌋} c(n, k, π).

If S has curling number 1 then the shortest Y for which S = XY is simply the last term of S,
so π = 1 and S ∈ C(n, 1, 1). The sets C(n, 1, π) for π > 1 are empty.

We let P(n, k) denote the subset of primitive S ∈ C(n, k), and p(n, k) := #P(n, k). Note that
C(n, 1) = P(n, 1), since curling number 1 implies primitive.

Also let Q(n, k) := ⋃_{i=1}^{k} P(n, i) denote the set of primitive sequences with curling number at
most k, and q(n, k) := #Q(n, k) = Σ_{i=1}^{k} p(n, i). We set q(n, 0) = 0 and q(n, k) = q(n, n) for k > n.
By definition, q(n, n) is the total number of aperiodic binary sequences of length n, and it is well
known that

    q(n, n) = Σ_{d|n} μ(n/d) 2^d ,    (4)

where μ is the Möbius function (q(n, n) is sequence A027375).
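Formula (4) is easy to check against a direct count of primitive sequences. The sketch below is illustrative; the helper names `mobius`, `primitive_count`, `is_primitive` and `primitive_count_brute` are assumptions, and the brute-force comparison is only run for small n.

```python
from itertools import product


def mobius(n):
    """Möbius function μ(n) by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:        # p^2 divides the original n
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def primitive_count(n):
    """q(n, n) = sum over d | n of μ(n/d) 2^d, as in (4)."""
    return sum(mobius(n // d) * 2 ** d for d in range(1, n + 1) if n % d == 0)


def is_primitive(seq):
    """True unless seq = T^i for some T and some i >= 2."""
    n = len(seq)
    return not any(n % d == 0 and seq[:d] * (n // d) == seq for d in range(1, n))


def primitive_count_brute(n):
    return sum(is_primitive(s) for s in product((2, 3), repeat=n))


print([primitive_count(n) for n in range(1, 13)])
```

The printed values can be compared with the diagonal entries q(n, n) of Table 6.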

Call S ∈ P(n, k) robust if no proper suffix of S^{k+1} has curling number k + 1. Examples of
non-robust sequences first appear at length 5, where S = 3 2 2 3 2 ∈ C(5, 1) is not robust since

    S^2 = 3 2 2 3 2 3 2 2 3 2

has the suffix (2 3 2)^2. At length 8 there are examples with k = 2, such as S = 3 2 2 3 2 2 3 2,
for which S^3 has the suffix (2 3 2)^3. Let P'(n, k) denote the subset of robust S ∈ P(n, k), and let
p'(n, k) := #P'(n, k).
Tables 4, 5, 6, and 7 show the initial values of c(n, k), p(n, k), q(n, k), and p'(n, k), respectively.
There are far fewer non-robust sequences than robust, and their numbers are shown in Table 8.
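Robustness can be tested directly from the definition. In the sketch below, the function names are hypothetical, and one reading of the definition is assumed and should be treated as such: a sequence is reported as non-robust when some proper suffix of S^{k+1} has curling number at least k + 1 (the phrase "greater than k" is the form used later in the proof of Theorem 6).

```python
from itertools import product


def curling_number(seq):
    """Largest k such that seq = X + Y*k for some nonempty block Y."""
    n, best = len(seq), 1
    for length in range(1, n // 2 + 1):
        k = 1
        while (k + 1) * length <= n and seq[n - (k + 1) * length:n - k * length] == seq[n - length:]:
            k += 1
        best = max(best, k)
    return best


def is_robust(seq, k):
    """seq is assumed primitive with curling number k; test all proper suffixes of seq^(k+1)."""
    power = seq * (k + 1)
    return all(curling_number(power[i:]) <= k for i in range(1, len(power)))


# Non-robust sequences of length 5 with curling number 1 (these are automatically primitive).
non_robust = [s for s in product((2, 3), repeat=5)
              if curling_number(s) == 1 and not is_robust(s, 1)]
print(non_robust)
print(is_robust((3, 2, 2, 3, 2, 2, 3, 2), 2))   # the length-8 example with k = 2
```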

3.2 Three preliminary theorems. 

The classical Fine-Wilf theorem ([4]; [1, p. 13], [5], [6, p. 10]) turns out to be very useful for studying
curling numbers.

Table 4: Table of c(n, k), the number of binary sequences of length n and curling number k, for
1 ≤ k ≤ n and n ≤ 12 (for an extended table see A216955).

Table 5: Table of p(n, k), the number of primitive binary sequences of length n and curling number
k, for 1 ≤ k ≤ n and n ≤ 12 (for an extended table see A218869).
n\k      1     2     3     4     5     6     7     8     9    10    11    12

  1      2
  2      2     2
  3      4     6     6
  4      6    10    12    12
  5     12    24    28    30    30
  6     20    40    48    52    54    54
  7     40    92   112   120   124   126   126
  8     74   174   210   226   234   238   240   240
  9    148   362   438   474   490   498   502   504   504
 10    286   700   860   928   960   976   984   988   990   990
 11    572  1448  1776  1916  1984  2016  2032  2040  2044  2046  2046
 12   1124  2846  3486  3762  3894  3958  3990  4006  4014  4018  4020  4020

Table 6: Table of q(n, k), the number of primitive binary sequences of length n and curling number
at most k, for 1 ≤ k ≤ n and n ≤ 12 (for an extended table see A218870).



n\k      1     2     3     4     5     6     7     8     9    10    11    12

  1      2
  2      2     0
  3      4     2     0
  4      6     4     2     0
  5     10    12     4     2     0
  6     20    20     8     4     2     0
  7     36    52    20     8     4     2     0
  8     72    98    36    16     8     4     2     0
  9    142   214    76    36    16     8     4     2     0
 10    280   414   160    68    32    16     8     4     2     0
 11    560   870   326   140    68    32    16     8     4     2     0
 12   1114  1720   640   276   132    64    32    16     8     4     2     0

Table 7: Table of p'(n, k), the number of robust primitive binary sequences of length n and curling
number k, for 1 ≤ k ≤ n and n ≤ 12 (for an extended table see A218875).
n\k      1     2     3     4     5     6     7     8     9    10    11    12

  1      0
  2      0     0
  3      0     0     0
  4      0     0     0     0
  5      2     0     0     0     0
  6      0     0     0     0     0     0
  7      4     0     0     0     0     0     0
  8      2     2     0     0     0     0     0     0
  9      6     0     0     0     0     0     0     0     0
 10      6     0     0     0     0     0     0     0     0     0
 11     12     6     2     0     0     0     0     0     0     0     0
 12     10     2     0     0     0     0     0     0     0     0     0     0

Table 8: The numbers p(n, k) − p'(n, k) of non-robust primitive binary sequences of length n and
curling number k, for 1 ≤ k ≤ n and n ≤ 12 (for an extended table see A218876).



Theorem 3. (Fine and Wilf.) If sequences S = X^i and T = Y^j have a common suffix U of length

    |U| ≥ |X| + |Y| − gcd(|X|, |Y|),    (5)

then, for some sequence Z and integers g, h, we have X = Z^g, Y = Z^h, |Z| = gcd(|X|, |Y|).

There is an equivalent definition of robustness that is easier to check. 

Theorem 4. If S ∈ P(n, k) is not robust, implying that S^{k+1} has a proper suffix T^{k+1} for some
T, then T^{k+1} is in fact a proper suffix of S^2.

Proof. The assertion is trivially true if k = 1, so we assume k ≥ 2. The hypotheses imply t :=
|T| < n. Now S^{k+1} and T^{k+1} have a common suffix of length (k + 1)t. If it were the case that
(k + 1)t ≥ n + t − 1, by Theorem 3 we would have S = Z^g, T = Z^h, for some Z, g, h with g > h,
implying g ≥ 2 and so S would be imprimitive, a contradiction. So (k + 1)t < n + t − 1 < 2n, as
required. □

It follows that S ∈ P(n, k) is robust if and only if no proper suffix of S^2 has curling number
k + 1. This greatly simplifies the computation of the numbers p'(n, k).

A trivial but useful observation is that prefixing a sequence with a single number cannot increase 
the curling number by more than 1: 

Theorem 5. If S ∈ C(n, k) then 2S (and equally 3S) is in either C(n + 1, k) or C(n + 1, k + 1).

Proof. If, for example, 2S ∈ C(n + 1, l) with l ≥ k + 2, then 2S = UV^l for some U, V, l, and V^{l−1}
(at least) is a suffix of S, contradicting the fact that S has curling number k. □
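Theorem 5 can be confirmed exhaustively for short sequences. This is a minimal sketch with hypothetical function names; it simply prefixes every sequence of 2's and 3's up to a given length with a 2 and with a 3, and checks that the curling number either stays the same or increases by exactly 1.

```python
from itertools import product


def curling_number(seq):
    """Largest k such that seq = X + Y*k for some nonempty block Y."""
    n, best = len(seq), 1
    for length in range(1, n // 2 + 1):
        k = 1
        while (k + 1) * length <= n and seq[n - (k + 1) * length:n - k * length] == seq[n - length:]:
            k += 1
        best = max(best, k)
    return best


def prefix_bound_holds(max_n):
    """Check cn(aS) in {cn(S), cn(S) + 1} for a in {2, 3} and all S of length <= max_n."""
    for n in range(1, max_n + 1):
        for seq in product((2, 3), repeat=n):
            k = curling_number(seq)
            for a in (2, 3):
                if curling_number((a,) + seq) not in (k, k + 1):
                    return False
    return True


print(prefix_bound_holds(10))
```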

3.3 A recurrence for c(n,k). 

The first main theorem of this section expresses the n-th row of the c(n, k) table in terms of the
(n − 1)st row and much earlier rows of the p(n, k) and p'(n, k) tables.

Theorem 6. c(n, k) = 0 for n ≤ k − 1, c(n, k) = 2 for n = k, k + 1, and for n ≥ k + 2 we have

    c(n, k) = 2c(n − 1, k)
              + [k | n] ( p'(n/k, k − 1) + q(n/k, k − 2) )
              − [(k + 1) | n] ( p'(n/(k + 1), k) + q(n/(k + 1), k − 1) ),    (6)

where the Iverson bracket [R] is 1 if the relation R is true, 0 otherwise.

Proof. We assume k ≥ 1 and n ≥ k + 2. Suppose S ∈ C(n, k) and let T denote S with its left-most
term deleted. We consider the cases cn(T) = k and cn(T) < k separately.

In the first case, if T is any sequence in C(n − 1, k), and S is 2T or 3T, then, by Theorem 5, S
is in either C(n, k) or C(n, k + 1). So we will obtain 2c(n − 1, k) sequences in C(n, k), except that
we must exclude from the count those T ∈ C(n − 1, k) with the property that 2T or 3T = V^{k+1} for
some primitive V of length n/(k + 1). This can only happen when n is a multiple of k + 1. These
V's are primitive sequences of length n/(k + 1), with curling number l ≤ k, and are such that no
proper suffix of V^{k+1} has curling number greater than k. If l = k, the number of such V's is (by
definition) p'(n/(k + 1), k). On the other hand, if 1 ≤ l ≤ k − 1, any V ∈ P(n/(k + 1), l) has the
property that no proper suffix of V^{k+1} has curling number greater than k (and the number of these
is p(n/(k + 1), l), which summed over l gives q(n/(k + 1), k − 1)). This follows from the Fine-Wilf
theorem (Theorem 3). For if V^{k+1} has a proper suffix of the form U^{k+1}, then these two sequences
overlap in the last (k + 1)u terms, where u = |U|, and also u < v, where v = |V| = n/(k + 1). Since
V has curling number l < k, the right-most k copies of U are not a suffix of V, and so ku > v.
This implies

    (k + 1)u ≥ v + u − 1,    (7)

and so by Theorem 3, V = Z^g, U = Z^h, h < g, g ≥ 2. But V^k = Z^{kg} is a suffix of T, so
cn(T) ≥ 2k > k, a contradiction. (Further applications of the Fine-Wilf theorem will follow this
same pattern, and we will not give as much detail.)

In the second case we must consider sequences S = V^k where cn(T) < k. Now n must be a
multiple of k, and V ∈ P(n/k, l) for 1 ≤ l ≤ k − 1 is such that no proper suffix of V^k has curling
number k. If l = k − 1, the number of such V's is (by definition) p'(n/k, k − 1). On the other hand,
if 1 ≤ l ≤ k − 2, the condition that no proper suffix of V^k has curling number k follows from the
Fine-Wilf theorem by an argument similar to that given above (except that k + 1 is replaced by k),
and the number is p(n/k, l). This completes the proof of the theorem. □
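The recurrence of Theorem 6 can be compared with brute-force counts for small n. The sketch below makes several assumptions that should be read as such: the recurrence is taken in the form c(n, k) = 2c(n − 1, k) + [k | n](p'(n/k, k − 1) + q(n/k, k − 2)) − [(k + 1) | n](p'(n/(k + 1), k) + q(n/(k + 1), k − 1)), which is what the two cases of the proof give; robustness is tested as "no proper suffix of S^{k+1} has curling number greater than k"; quantities with out-of-range arguments (such as p'(n, 0)) are taken to be 0; and all function names are hypothetical.

```python
from itertools import product


def cn(seq):
    """Curling number of a nonempty tuple."""
    n, best = len(seq), 1
    for length in range(1, n // 2 + 1):
        k = 1
        while (k + 1) * length <= n and seq[n - (k + 1) * length:n - k * length] == seq[n - length:]:
            k += 1
        best = max(best, k)
    return best


def is_primitive(seq):
    n = len(seq)
    return not any(n % d == 0 and seq[:d] * (n // d) == seq for d in range(1, n))


def is_robust(seq, k):
    power = seq * (k + 1)
    return all(cn(power[i:]) <= k for i in range(1, len(power)))


def brute_force_tables(max_n):
    """Return dicts c, p, pr holding c(n,k), p(n,k), p'(n,k), keyed by (n, k)."""
    c, p, pr = {}, {}, {}
    for n in range(1, max_n + 1):
        for seq in product((2, 3), repeat=n):
            k = cn(seq)
            c[n, k] = c.get((n, k), 0) + 1
            if is_primitive(seq):
                p[n, k] = p.get((n, k), 0) + 1
                if is_robust(seq, k):
                    pr[n, k] = pr.get((n, k), 0) + 1
    return c, p, pr


def recurrence_holds(max_n):
    c, p, pr = brute_force_tables(max_n)

    def q(m, j):                       # q(m, j) = p(m, 1) + ... + p(m, j)
        return sum(p.get((m, i), 0) for i in range(1, j + 1))

    for k in range(1, max_n + 1):
        for n in range(k + 2, max_n + 1):
            rhs = 2 * c.get((n - 1, k), 0)
            if n % k == 0:
                rhs += pr.get((n // k, k - 1), 0) + q(n // k, k - 2)
            if n % (k + 1) == 0:
                rhs -= pr.get((n // (k + 1), k), 0) + q(n // (k + 1), k - 1)
            if c.get((n, k), 0) != rhs:
                return False
    return True


print(recurrence_holds(10))
```

Running the check up to n = 10 exercises the first non-robust sequences, since c(10, 1) depends on p'(5, 1).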

3.4 Sequences with curling number 1. 

For the purpose of investigating the Curling Number Conjecture, we are particularly interested in 
the first three columns of the c(n, k) table, since they determine the probabilities that a random 
sequence of 2's and 3's has curling number 1, 2, 3, or ≥ 4 (see §4.2). The values of c(n, 1) are
especially intriguing, as this is a combinatorial problem of independent interest. The first 30 terms
of c(n, 1) were contributed to [8] by Guy P. Srinivasan in 2006, who described it as the "number
of binary sequences of length n with no initial repeats", which is equivalent to our definition
(see A122536). However, even for c(n, 1), no explicit formula or recurrence is presently known (see footnote 4).

4 The best we have is the asymptotic estimate (27). 
Theorem 6 says only that

    c(n, 1) = 2c(n − 1, 1) − [2 | n] p'(n/2, 1),    (8)

p'(n/2, 1) being the number of robust primitive binary sequences of length n/2 and curling number
1.

Use of (8) enables n terms of the c(·, 1) sequence to be obtained from n/2 terms of the p'(·, 1)
sequence. In practice, this limits us to about 100 terms of the former sequence. In order to obtain
more terms, we introduce some further terminology (which will be used only in this section).
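As an illustration of (8), the twelve values p'(m, 1) for 1 ≤ m ≤ 12 in the first column of Table 7 determine c(n, 1) for n up to 25. The sketch below is illustrative only; the names `P_ROBUST_K1` and `c1_from_robust_counts` are assumptions, and the starting values c(1, 1) = c(2, 1) = 2 are those given by Theorem 6.

```python
# p'(m, 1) for m = 1..12, read from the first column of Table 7.
P_ROBUST_K1 = [2, 2, 4, 6, 10, 20, 36, 72, 142, 280, 560, 1114]


def c1_from_robust_counts(p_robust):
    """Return {n: c(n, 1)} for all n reachable from the given p'(m, 1) values via (8)."""
    max_n = 2 * len(p_robust) + 1          # the last odd n needs no further p' value
    c = {1: 2, 2: 2}                       # c(n, k) = 2 for n = k, k + 1
    for n in range(3, max_n + 1):
        c[n] = 2 * c[n - 1]
        if n % 2 == 0:
            c[n] -= p_robust[n // 2 - 1]   # subtract [2 | n] p'(n/2, 1)
    return c


c1 = c1_from_robust_counts(P_ROBUST_K1)
print([c1[n] for n in range(1, 13)])
```

The first twelve values can be compared with the first column of Table 6, since q(n, 1) = p(n, 1) = c(n, 1).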

If S has length n, let S^(i) denote its length-i suffix, for 1 ≤ i ≤ n. Then we define

A(n, i) := {S G C(n, 1) | cn(S® S) = l}, 1 < i < n , 

B(n, i) := {5 e C(n, 1) | cn(5 S) = l}, 1 < i < n , 

£(n,i,j) := {SeC(n, 1) | S [{] S £ B(n + i, j)}, 1 < i < n, 1 < j < n + i , 

and let a(n, i) = #A(n, i), b(n, i) = #B(n, i), e(n, i, j) = #E(n, i, j). S ⪰ T will mean that T is a
suffix of S, and S ≻ T that T is a proper suffix of S.

The following two theorems give a canonical form (see (9)) for non-robust sequences with curling 
number 1. 

Theorem 7. If cn(S) = 1 but cn(TS) > 1 for some T with $S \succ T$, then there exist $X \ne \epsilon$, $Y \ne \epsilon$
with

$$S = XYX, \quad \text{where } \mathrm{cn}(X) = 1,\ T \succeq Y, \text{ and } X \succ Y. \tag{9}$$

Proof. Since cn(TS) > 1, $TS \succeq ZZ$ for some Z with |Z| < |S|, and therefore $S \succ Z$. We write
S = XZ and observe that $TXZ \succeq ZZ$, so $TX \succeq Z$. Therefore either $X \succeq Z$ or $Z \succ X$. The
former implies cn(S) > 1, a contradiction. So $Z \succ X$, say Z = YX, and S = XYX.

Since $S \succeq X$, cn(X) = 1. Also $TX \succeq Z = YX$, so $T \succeq Y$. It remains to show that $X \succ Y$.
Now $S \succ T \succeq Y$ and $S \succ X$, so either $Y \succeq X$ or $X \succ Y$. The former implies Y = WX for some
W, and then S = XYX = XWXX, contradicting cn(S) = 1. So $X \succ Y$. □

Theorem 8. If S = XYX = UVU, with $X \succ Y \ne \epsilon$, $U \succ V \ne \epsilon$, $X \ne U$, then cn(S) > 1.

Proof. Suppose on the contrary that cn(S) = 1. Without loss of generality, $U \succ X$. Since both X and U
are prefixes of S, we have U = XZ for some $Z \ne \epsilon$, and $S \succ XZ$. Now 2|Z| = |S| - 2|X| - |V| < |S| - 2|X| = |Y| < |X|,
so |X| > |Z|. This implies $X \succ Z$ (they are both suffixes of S), say X = AZ, so S = UVU = UVXZ = UVAZZ,
contradicting cn(S) = 1. □

Corollary 9. For each n, the sets B(n,i) (1 ≤ i ≤ n) are disjoint, and consequently the sets
E(n,i,j) (1 ≤ i ≤ n, 1 ≤ j ≤ n + i) are disjoint.

Corollary 10. (i) For $1 \le i < n/3$, there is a bijection between $C(n,1) \setminus A(n,i)$ and

$$\bigcup_{m=\lceil (n-i)/2 \rceil}^{\lfloor (n-1)/2 \rfloor} B(m, n-2m).$$

(ii) For $n/3 \le i < n$, there is a bijection between $C(n,1) \setminus A(n,i)$ and

$$\bigcup_{m=\lceil n/3 \rceil}^{\lfloor (n-1)/2 \rfloor} B(m, n-2m).$$

Corollary 11. (i) For $1 \le i < n/3$,

$$a(n,i) = c(n,1) - \sum_{m=\lceil (n-i)/2 \rceil}^{\lfloor (n-1)/2 \rfloor} b(m, n-2m). \tag{10}$$

(ii) For $n/3 \le i < n$,

$$a(n,i) = c(n,1) - \sum_{m=\lceil n/3 \rceil}^{\lfloor (n-1)/2 \rfloor} b(m, n-2m). \tag{11}$$

The next three theorems give a further refinement of non-robust sequences, and lead to the set 
bijections and formulas in Corollaries 15 and 16. We postpone their proofs to the Appendix. 

Theorem 12. If $X \succ Y$, cn(YX) = 1, and cn(XYX) > 1, then there exist S and T such that
YX = STS with $X \succeq T$, $S \succ T$, cn(S) = 1. Furthermore, either |S| = |Y| or |S| ≥ 2|Y|.

Theorem 13. If n/2 ≤ i < n then there is a bijection between A(n,i) \ B(n,i) and B(i, n - i).

Theorem 14. If 1 ≤ i < n/3 then A(n,i) \ B(n,i) is a disjoint union of E(m - i, i, n + i - 2m),
where $\max(2i, \lceil (n+i)/3 \rceil) \le m \le \lfloor (n+i-1)/2 \rfloor$.

Corollary 15. (i) For 1 ≤ i < n/5, there is a bijection between A(n,i) \ B(n,i) and the disjoint
union of E(m - i, i, n + i - 2m), where $2i \le m \le \lfloor (n+i-1)/2 \rfloor$.

(ii) For n/5 ≤ i < n/3, there is a bijection between A(n,i) \ B(n,i) and the disjoint union of
E(m - i, i, n + i - 2m), where $\lceil (n+i)/3 \rceil \le m \le \lfloor (n+i-1)/2 \rfloor$.

(iii) For n/3 ≤ i < n/2, A(n,i) \ B(n,i) is empty: by Theorem 12 an element would need either
|S| = |Y| = i, which forces |T| = n - i > |S|, or |S| ≥ 2i, which forces |T| ≤ n - 3i ≤ 0.

(iv) For n/2 ≤ i < n, there is a bijection between A(n,i) \ B(n,i) and B(i, n - i).
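
Corollary 16. (i) For $1 \le i < n/3$,

$$b(n,i) = a(n,i) - \sum_{m=\max(2i,\,\lceil (n+i)/3 \rceil)}^{\lfloor (n+i-1)/2 \rfloor} e(m-i,\, i,\, n+i-2m). \tag{12}$$

(ii)

$$b(n,i) = a(n,i) \quad \text{for } n/3 \le i < n/2. \tag{13}$$

(iii)

$$b(n,i) = a(n,i) - b(i, n-i) \quad \text{for } n/2 \le i < n. \tag{14}$$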
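
The counting functions a(n,i) and b(n,i) can be checked by brute force for small n. The following is only an illustrative sketch, not the method used in the paper: the helper names (`curling_number`, `cn1_sequences`, `a`, `b`, `check_relations`) are ours, and it assumes that B(n,i) is the set of S in C(n,1) with cn(S S^(i) S) = 1, where S^(i) is the length-i suffix of S. It tests relation (14), and the equality a(n,i) = b(n,i) in the range n/3 ≤ i < n/2.

```python
from functools import lru_cache
from itertools import product


def curling_number(seq):
    """Largest k such that seq ends with k consecutive copies of some nonempty block."""
    n, best = len(seq), 1
    for p in range(1, n // 2 + 1):  # p = length of the repeated block Y
        k = 1
        while (k + 1) * p <= n and seq[n - (k + 1) * p:n - k * p] == seq[n - p:]:
            k += 1
        best = max(best, k)
    return best


@lru_cache(maxsize=None)
def cn1_sequences(n):
    """C(n, 1): all sequences of n 2's and 3's with curling number 1."""
    return tuple(s for s in product((2, 3), repeat=n) if curling_number(s) == 1)


@lru_cache(maxsize=None)
def a(n, i):
    """#A(n, i): S in C(n, 1) with cn(S^(i) S) = 1."""
    return sum(1 for s in cn1_sequences(n) if curling_number(s[n - i:] + s) == 1)


@lru_cache(maxsize=None)
def b(n, i):
    """#B(n, i): S in C(n, 1) with cn(S S^(i) S) = 1 (assumed reading of the definition)."""
    return sum(1 for s in cn1_sequences(n) if curling_number(s + s[n - i:] + s) == 1)


def check_relations(max_n=9):
    """Return True if (13) and (14) hold for every 2 <= n <= max_n."""
    for n in range(2, max_n + 1):
        for i in range(1, n):
            # n/3 <= i < n/2, written with integers only
            if 3 * i >= n and 2 * i < n and b(n, i) != a(n, i):
                return False
            # n/2 <= i < n
            if 2 * i >= n and b(n, i) != a(n, i) - b(i, n - i):
                return False
    return True
```

For instance, `a(3, 2)` counts all four sequences of C(3,1), while `b(3, 2)` keeps only 2 2 3 and 3 3 2, in agreement with (14) since `b(2, 1)` is 2.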

We have briefly investigated the possibility of generalizing the approach in this section to deal 
with curling numbers k greater than one. The following theorems replace Theorems 7 and 8: 

Theorem 17. Suppose $S \in \mathcal{P}(n,k) \setminus \mathcal{P}'(n,k)$, where k > 1. Then there exist X and T with
$S = X(TX)^k$ and $S \succ T$.

Proof. By Theorem 4, $S^2 = PQ^{k+1}$ with $P \ne \epsilon$. If (k + 1)|Q| > n + k - 1, then Theorem 3 would
imply that S is periodic. So k|Q| < n - 1, and k copies of Q lie properly inside S, say $S = XQ^k$
with |X| < |Q|, $X \ne \epsilon$. Define T by Q = TX, and we have $S = X(TX)^k$. Also $PQ^{k+1} = SXQ^k$,
so PQ = PTX = SX, and $S \succ T$. □

Theorem 18. The representation S = X(TX) k obtained in Theorem 17 is unique. 

We omit the proof. Since $S \in \mathcal{P}(n,k)$, we know that S can be written as $XY^k$, where Y is
primitive, possibly in several ways. Theorems 17 and 18 say that if S is not robust, then exactly
one of these Y's has the corresponding X as a suffix. We have not pursued the generalizations of
Theorems 12-14 and Corollaries 15-16 to this case.

3.5 c(n, k) for k ≥ ⌊√n⌋.

The second main theorem of this section gives an expression for c(n, k) in the range k ≥ ⌊√n⌋ that
involves the partial sum function q(m, k).

Theorem 19. c(n, n) = 2 for all n, c(n, n - 1) = 2 for n ≥ 2, and for n ≥ 4 and k ≥ ⌊√n⌋ we
have

$$c(n,k,\pi) = \begin{cases} 2^{\,n-(k+1)\pi}\,(2^{\pi}-1)\, q(\pi,k-1), & \text{if } 1 \le \pi \le \left\lfloor \frac{n}{k+1} \right\rfloor, \\[4pt] 2^{\,n-k\pi}\, q(\pi,k-1), & \text{if } \left\lceil \frac{n+1}{k+1} \right\rceil \le \pi \le \left\lfloor \frac{n}{k} \right\rfloor, \end{cases} \tag{15}$$

and $c(n,k) = \sum_{\pi=1}^{\lfloor n/k \rfloor} c(n,k,\pi)$.

Proof. We assume n ≥ 4 and k ≥ ⌊√n⌋ ≥ 2. Note that k ≥ ⌊√n⌋ is equivalent to k + 1 > √n.
We consider the cases $n \le \pi(k+1) - 1$ and $n \ge (k+1)\pi$ separately.
First, if $n \le \pi(k+1) - 1$, we have

$$\left\lceil \frac{n+1}{k+1} \right\rceil \le \pi \le \left\lfloor \frac{n}{k} \right\rfloor.$$

Let us write $S = XY^k$, where Y is minimal and has length $\pi$. Then $n \le \pi(k+1) - 1$ implies
$|X| < \pi$. By Lemma 2, $Y \in Q(\pi, k-1)$. There are $2^{n-\pi k}$ choices for X, and $q(\pi, k-1)$ choices
for Y, and we claim that the resulting sequence $XY^k$ always has curling number k. For suppose
it has curling number > k, so that we have $XY^k = UV^{k+1}$, with u = |U|, v = |V|. There are two
sub-cases. If $(k+1)v \ge k\pi$, then we have $(k+1)\pi > |S| \ge (k+1)v$, implying $\pi > v$. The two
different representations of S have a common suffix $Y^k$ of length $k\pi$, which, since k ≥ 2, satisfies

$$k\pi \ge v + \pi - 1. \tag{16}$$

By Theorem 3, $Y = Z^g$, $V = Z^h$, with g > h, so g ≥ 2, and Y is imprimitive, a contradiction. On
the other hand, suppose $(k+1)v < k\pi$. Again $\pi > v$. Since cn(Y) < k, $kv > \pi$. Now the common
suffix has length (k + 1)v, and our inequalities imply

$$(k+1)v \ge v + \pi - 1, \tag{17}$$

and, again by Theorem 3, Y is imprimitive, a contradiction. So the number of sequences S of this
type is $2^{n-k\pi}\, q(\pi, k-1)$, as claimed.
Second, if $n \ge (k+1)\pi$, we have

$$1 \le \pi \le \left\lfloor \frac{n}{k+1} \right\rfloor.$$

Let us write

$$S = XBY^k, \tag{18}$$

where X has length $n - (k+1)\pi$, B has length $\pi$, and $Y \in Q(\pi, k-1)$. Certainly $B \ne Y$ (B
stands for "blocker", the purpose of which is to ensure that Y is repeated only k times). There are
potentially $2^{n-(k+1)\pi}$ choices for X, $2^{\pi} - 1$ choices for B, and $q(\pi, k-1)$ choices for Y. We claim
that the assumption k ≥ ⌊√n⌋ guarantees that all choices result in a sequence with curling number
k. For suppose on the contrary that S (in (18)) is also equal to $UV^{k+1}$, with u = |U|, v = |V|.
Again there are two sub-cases. If $(k+1)v \ge k\pi$, then we have

$$(k+1)^2 > n \ge (k+1)v \ge k\pi,$$

so k + 1 > v, k ≥ v. The two different representations of S have a common suffix of length $k\pi$, and
our inequalities imply (16). On the other hand, suppose $(k+1)v < k\pi$. Again we have $kv > \pi$,
and the common suffix satisfies (17). In both cases Theorem 3 now leads to a contradiction. This
completes the proof of the theorem. □

The formulas in Theorem 19 cover a large portion of the c(n, k) table. However, although with 
more work they could be extended so as to apply to slightly smaller values of k, it seems unlikely 
that this approach will lead to a formula for c(n, k, π) for small values of k.
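
The formula of Theorem 19 can be compared with direct enumeration for small n. The sketch below is illustrative and rests on assumptions that go beyond this section: it takes q(m, k) to be the number of primitive sequences of length m over {2, 3} with curling number at most k (the "partial sum" of p(m, ·)), and it reads (15) as a two-case formula with the cases split at π = ⌊n/(k+1)⌋. All function names are ours.

```python
from itertools import product
from math import isqrt


def curling_number(seq):
    """Largest k such that seq ends with k consecutive copies of some nonempty block."""
    n, best = len(seq), 1
    for p in range(1, n // 2 + 1):
        k = 1
        while (k + 1) * p <= n and seq[n - (k + 1) * p:n - k * p] == seq[n - p:]:
            k += 1
        best = max(best, k)
    return best


def is_primitive(seq):
    """True if seq is not a power Z^j (j >= 2) of a shorter sequence Z."""
    n = len(seq)
    return not any(n % p == 0 and seq == seq[:p] * (n // p) for p in range(1, n))


def q(m, k):
    """Assumed meaning: number of primitive length-m sequences with curling number <= k."""
    return sum(1 for y in product((2, 3), repeat=m)
               if is_primitive(y) and curling_number(y) <= k)


def c_formula(n, k):
    """c(n, k) from (15), summed over the length pi of the minimal repeated block."""
    total = 0
    for pi in range(1, n // k + 1):
        if pi <= n // (k + 1):
            # room for a full "blocker" B of length pi in front of Y^k
            total += 2 ** (n - (k + 1) * pi) * (2 ** pi - 1) * q(pi, k - 1)
        else:
            # the prefix X is shorter than pi, so no blocker is needed
            total += 2 ** (n - k * pi) * q(pi, k - 1)
    return total


def c_brute(n, k):
    """c(n, k) by enumerating all 2^n sequences of 2's and 3's."""
    return sum(1 for s in product((2, 3), repeat=n) if curling_number(s) == k)
```

For example, `c_formula(6, 2)` adds the contributions 16, 6 and 4 from π = 1, 2, 3, giving 26, the same value as `c_brute(6, 2)`.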

3.6 The difference table d(n,k). 



n\k    1    2    3    4    5    6    7    8    9   10   11   12
  2    2   -2
  3    0    2   -2
  4    2   -2    2   -2
  5    0    0    0    2   -2
  6    4   -2   -2    0    2   -2
  7    0    0    0    0    0    2   -2
  8    6   -6    2   -2    0    0    2   -2
  9    0    6   -6    0    0    0    0    2   -2
 10   10  -10    0    2   -2    0    0    0    2   -2
 11    0    0    0    0    0    0    0    0    0    2   -2
 12   20  -10   -4   -6    2   -2    0    0    0    0    2   -2


Table 9: The difference table d(n,k) defined by (19) (for an extended table see A217943). 

In the c(n,k) table (Table 4), if we look at the difference between each row and twice the
previous row, we obtain a much simpler table (see footnote 5). We define

$$d(n,k) := 2\,c(n-1,k) - c(n,k), \tag{19}$$

for n ≥ 2, 1 ≤ k ≤ n - 1, with d(n,n) = -2. The initial values are shown in Table 9. We see
that if one ignores the initial entries in each row, most of the remaining entries are zero, except for 
diagonal lines of pairs of nonzero entries. More precisely, it appears that 



$$\begin{aligned}
d(2k,k-1) &= -d(2k,k) = 2, & k &\ge 2,\\
d(3k,k-1) &= -d(3k,k) = 6, & k &\ge 5,\\
d(4k,k-1) &= -d(4k,k) = 12, & k &\ge 6,\\
d(5k,k-1) &= -d(5k,k) = 30, & k &\ge 7,
\end{aligned} \tag{20}$$

and so on. Only the first of these diagonal lines can be seen in Table 9, but they are all visible in 
the extended table that is given in entry A217943 in [8]. These expressions all follow from Theorem 
19: 

Footnote 5: It was by studying the d(n, k) table that we were led to Theorems 6 and 19.

Theorem 20. In the range k ≥ ⌊√n⌋, the only nonzero entries in the d(n, k) table are

$$d(mk,k-1) = -d(mk,k) = q(m,m), \quad \text{for } m \ge 1,\ k \ge m+2. \tag{21}$$

Proof. This follows easily from Theorem 19. We prove the second assertion in (21) as an illustration. 
We have 

$$d(mk,k) = 2\,c(mk-1,k) - c(mk,k). \tag{22}$$

From (15),

$$c(mk,k) = \sum_{\pi=1}^{m-1} c(mk,k,\pi) + c(mk,k,m), \tag{23}$$

$$c(mk-1,k) = \sum_{\pi=1}^{m-1} c(mk-1,k,\pi). \tag{24}$$



Each summand in (23) (see (15)) is exactly twice the corresponding term in (24), and c(mk,k,m) 
= q(m, k) = q(m, m), so d(mk, k) = —q(m, m). □ 

Note that whereas the expression for c(n, k) in Theorem 19 involves the general function q(π, k),
the expression for d(n,k) in the range k ≥ ⌊√n⌋ is fully explicit, since q(m,m) is given by (4).
Theorem 6 gives another formula for d(n, k):

$$d(n,k) = [k+1 \mid n]\left(p'\!\left(\tfrac{n}{k+1},k\right) + q\!\left(\tfrac{n}{k+1},k-1\right)\right) - [k \mid n]\left(p'\!\left(\tfrac{n}{k},k-1\right) + q\!\left(\tfrac{n}{k},k-2\right)\right), \tag{25}$$

and in particular,

$$d(n,1) = [2 \mid n]\; p'(n/2,1),$$

$$d(n,2) = [3 \mid n]\,\bigl(p'(n/3,2) + p(n/3,1)\bigr) - [2 \mid n]\; p'(n/2,1). \tag{26}$$

The first of these is nicely checked by noticing that the nonzero entries in the first column of the
d(n, k) table, namely 2, 2, 4, 6, 10, 20, ..., are also the entries in the first column of the p'(n, k) table
(Table 7). It is also worth mentioning that if p is prime then c(p, k) = 2c(p - 1, k) for all k (see
(8)) and so d(p, k) = 0.



3.7 Computation of c(n, k). 

We constructed an extensive table of values of c(n,k), hoping that it would lead to additional 
formulas for these numbers. First, by direct enumeration, using a number of different programs 
and different computers (including a cluster of 256 machines), we calculated c(n,k) for n < 51. 
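
For very small n the direct enumeration is easy to reproduce. The following minimal sketch (our own code and names, not the programs used for the table) counts the sequences of n 2's and 3's by curling number.

```python
from collections import Counter
from itertools import product


def curling_number(seq):
    """Largest k such that seq ends with k consecutive copies of some nonempty block."""
    n, best = len(seq), 1
    for p in range(1, n // 2 + 1):  # p = length of the repeated block Y
        k = 1
        # compare each earlier block of length p with the final block
        while (k + 1) * p <= n and seq[n - (k + 1) * p:n - k * p] == seq[n - p:]:
            k += 1
        best = max(best, k)
    return best


def c_row(n):
    """Return [c(n,1), ..., c(n,n)] by enumerating all 2^n sequences."""
    counts = Counter(curling_number(s) for s in product((2, 3), repeat=n))
    return [counts.get(k, 0) for k in range(1, n + 1)]


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, c_row(n))
```

The running time grows like 2^n times a polynomial in n, which is why larger n calls for the recurrences and the suffix-based approach described in this section.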

Second, we tabulated e(n,i,j) for n < 23. This was sufficient for the recurrences (8) and
(10)-(14) to give c(n, 1) for n ≤ 200. These values suggest the conjecture that

$$\lim_{n \to \infty} \frac{c(n,1)}{2^n} = 0.27004339525895354325\ldots. \tag{27}$$

From Equation (8) we have

$$c(n,1) \ge 2\,c(n-1,1) - [2 \mid n]\; c(n/2,1),$$

which implies, using the known values of c(n, 1), that

$$c(n,1) > 0.27 \cdot 2^n \quad \text{for } n > 200. \tag{28}$$

We omit the proof. But we have no comparable upper bound for c(n, 1) (other than $2^n$), nor a
proof that the limit (27) exists.

Third, we used a different approach, which enabled us to take a table of the curling numbers
of all sequences of length $n \le n_0$, and from this produce a table of c(n, k) for all $n \le 2n_0$, without
having to compute the curling numbers of all $2^{2n_0}$ sequences of length $2n_0$. The idea underlying
this approach is the following. Consider a sequence S of length n with $n_0 < n \le 2n_0$, and let M be
its length-$n_0$ suffix. As a first approximation, we set cn(S) = cn(M) = l (say). This approximation
will be wrong if for some suffix T of M it should happen that $T^{l+1}$ is a suffix of S. If so, we must
increase cn(S) by 1 for all S having suffix $T^{l+1}$. There are complications if there is more than one
such T to be considered, but the Fine-Wilf theorem (Theorem 3) shows that this can only happen
when l = 1. We omit discussion of the details. Using this approach (with $n_0 = 32$) we were able to
extend the table of values of c(n, k) to n = 64.

Finally, we tabulated p(n,k) and p'(n,k) for n < 36. This, together with the 200 terms of 
c(n, 1), was sufficient for the recurrence in Theorem 6 to give the first 104 rows of the c(n, k) table. 
These results can be seen in A216955 and A122536. 

4 Tail lengths of {2,3}-sequences 
4.1 Distribution of tail lengths. 

Let t(n, i) denote the number of starting sequences $S_0$ of n 2's and 3's which have tail length i,
where i ranges from 0 to Ω(n). The initial values are shown in Table 10. Since the rows rapidly
increase in length (cf. Table 1), we end this table at n = 9. Note that the entries for i = 9 through
55 (which are all zero) have been compressed into a single column. Rows n = 22 and 32 are shown
in Tables 11 and 12. Entry A217209 in [8] gives the first 48 rows in full. The first column is the
same as the first column of the c(n, k) table, and contains the numbers c(n, 1) that are the subject
of §3.4.

Table 10: Table of t(n, i), the number of sequences of n 2's and 3's with tail length i, for 0 ≤ i ≤ Ω(n)
(A217209).

Table 11: Distribution of tail lengths t(22, i), 0 ≤ i ≤ 120, for all starting sequences of length 22
(22 is the first time a tail of length 120 is reached). Note the three "clumps."

Table 12: Distribution of tail lengths t(32, i), 0 ≤ i ≤ 120, for all starting sequences of length 32.
The clumps have thickened.

As can be seen from Tables 10-12, the values in each row are distributed into clumps, with
each clump gradually thickening as n increases. Table 11 shows the distribution of tail lengths at
length 22, the first time that a tail of length 120 is reached (note the final "1", indicating that the
starting sequence was unique). By length 32 (Table 12), the clumps have thickened but still end
at 120. A tail of length greater than 120 does not appear until length 48, when the greatest tail
length jumps to 131. The powers of 2 in Tables 11 and 12 suggest that the clumps tend to grow
by prefixing good starting sequences of shorter length by random strings of 2's and 3's. However,
we do not have a satisfactory model which explains this distribution.
The mean value of the nth row,

$$\frac{1}{2^n} \sum_{i=0}^{\Omega(n)} i\, t(n,i),$$

at least for n ≤ 48, is converging to a value around 2.741... (see A216813). That is, if a starting
sequence of n 2's and 3's is chosen at random, it will reach a 1 on average after only 2.741... steps.
This is in sharp contrast to the behavior of the best starting sequences, as we see from Table 1.
Of course if the Curling Number Conjecture is false for sequences of 2's and 3's, the mean will be
infinite beyond some point.
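
For small n, the rows t(n, ·) and their means can be tabulated directly. The sketch below uses our own function names and assumes that the tail length of a starting sequence is the number of terms appended before the first appended 1 would occur; the `max_steps` guard is a hypothetical safety limit, needed because termination is exactly what the conjecture asserts.

```python
from collections import Counter
from itertools import product


def curling_number(seq):
    """Largest k such that seq ends with k consecutive copies of some nonempty block."""
    n, best = len(seq), 1
    for p in range(1, n // 2 + 1):
        k = 1
        while (k + 1) * p <= n and seq[n - (k + 1) * p:n - k * p] == seq[n - p:]:
            k += 1
        best = max(best, k)
    return best


def tail_length(start, max_steps=100_000):
    """Number of curling numbers appended to `start` before a 1 is reached."""
    seq = list(start)
    for steps in range(max_steps):
        k = curling_number(seq)
        if k == 1:
            return steps
        seq.append(k)
    raise RuntimeError("no 1 reached within max_steps")


def tail_distribution(n):
    """Counter mapping tail length i to t(n, i) over all sequences of n 2's and 3's."""
    return Counter(tail_length(s) for s in product((2, 3), repeat=n))


def mean_tail_length(n):
    """Mean of row n: sum of i * t(n, i), divided by 2^n."""
    dist = tail_distribution(n)
    return sum(i * count for i, count in dist.items()) / 2 ** n
```

The entry for tail length 0 in `tail_distribution(n)` is c(n, 1), since a starting sequence has tail length 0 exactly when its curling number is 1.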
4.2 A probabilistic model.

Let $\theta_k = c(n,k)/2^n$ denote the probability that a randomly chosen sequence of n 2's and 3's has
curling number k. The available data (n ≤ 200 for k = 1, n ≤ 104 for k > 1) suggests that as n
increases these probabilities are converging to the values

$$\theta_1 \approx .270, \quad \theta_2 \approx .434, \quad \theta_3 \approx .162, \quad \sum_{k \ge 4} \theta_k \approx .134.$$

When we extend a sequence S by appending the curling number k = cn(S), if it were the case that
the concatenation Sk were independent of S, we could model this process as a two-state Markov
chain with states "curling number is 2 or 3" and "curling number is 1 or ≥ 4." The probability of
staying in the "2 or 3" state would be $\theta_2 + \theta_3 \approx .596$ and the probability of leaving that state
would be about .404. If the starting sequence is randomly chosen from all $2^n$ possibilities, this model
would imply that the maximal number of steps before reaching the "1 or ≥ 4" state for the first time
would be about

$$t \approx n \cdot \frac{\log 2}{\log(1/.596)} \approx 1.34\, n.$$

This Markov model certainly does not apply at the beginning of the appending process, but it
could conceivably be valid once the sequence has been extended for a while, so we think it is worth
mentioning.
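
The constant 1.34 comes from asking when the expected number of survivors, 2^n · (.596)^t, drops to about 1. The short sketch below just carries out that arithmetic; the probabilities are the approximate limiting values quoted above, and the function name is ours.

```python
import math

# Approximate limiting probabilities of curling numbers 2 and 3 (from the text).
THETA_2 = 0.434
THETA_3 = 0.162
STAY = THETA_2 + THETA_3  # probability of remaining in the "2 or 3" state


def model_max_steps(n, stay=STAY):
    """Solve 2^n * stay^t = 1 for t: the model's estimate of the longest run."""
    return n * math.log(2) / math.log(1 / stay)


if __name__ == "__main__":
    for n in (10, 22, 48):
        print(n, round(model_max_steps(n), 1))
```

For n = 22 the model gives roughly 29 steps, far below the observed maximum tail length of 120, which illustrates why the model is not claimed to describe the start of the process.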

4.3 "Rotten" sequences: prefix decreases tail. 

Let $S_0$ be an arbitrary sequence of 2's and 3's of length n, with tail length $\tau(S_0) = i$, say. It
seems plausible that if n is large, then prefixing $S_0$ by a single 2 or 3 will not change $\tau(S_0)$, i.e.,
that $\tau(2S_0) = \tau(3S_0) = \tau(S_0)$. But could doing this actually decrease the tail length? Choosing
an adjective not normally used in mathematics, we will call $S_0$ rotten if either $\tau(2S_0) < \tau(S_0)$ or
$\tau(3S_0) < \tau(S_0)$, and doubly rotten if both $\tau(2S_0) < \tau(S_0)$ and $\tau(3S_0) < \tau(S_0)$ hold. There are
surprisingly few rotten sequences of length up through 30. The first few examples are shown in
Table 13, and the numbers of rotten sequences of lengths $1 \le n_0 \le 30$ are given in Table 14. If
$S_0$ = 3 2 3 2 3, for example, then $S_0^{(e)}$ = 3 2 3 2 3 2 3 3 2, and $\tau(S_0) = 4$. But if we prefix $S_0$ with a 2,
so the starting sequence is $2S_0$ = 2 3 2 3 2 3, the extension is 2 3 2 3 2 3 3 2, so $\tau(2S_0) = 2$, and $S_0$ is
rotten.
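
The example can be replayed mechanically. The following is a sketch with our own function names, again assuming that the tail length counts the terms appended before a 1 is reached; the `max_steps` guard is a hypothetical safety limit.

```python
def curling_number(seq):
    """Largest k such that seq ends with k consecutive copies of some nonempty block."""
    n, best = len(seq), 1
    for p in range(1, n // 2 + 1):
        k = 1
        while (k + 1) * p <= n and seq[n - (k + 1) * p:n - k * p] == seq[n - p:]:
            k += 1
        best = max(best, k)
    return best


def extension(start, max_steps=100_000):
    """The starting sequence followed by its tail (everything appended before the first 1)."""
    seq = list(start)
    for _ in range(max_steps):
        k = curling_number(seq)
        if k == 1:
            return seq
        seq.append(k)
    raise RuntimeError("no 1 reached within max_steps")


def tail_length(start):
    """tau(S_0): number of terms in the tail of the starting sequence."""
    return len(extension(start)) - len(start)


def is_rotten(start):
    """True if prefixing a 2 or a 3 strictly decreases the tail length."""
    start = tuple(start)
    tau = tail_length(start)
    return tail_length((2,) + start) < tau or tail_length((3,) + start) < tau


def is_doubly_rotten(start):
    """True if prefixing a 2 and prefixing a 3 both strictly decrease the tail length."""
    start = tuple(start)
    tau = tail_length(start)
    return tail_length((2,) + start) < tau and tail_length((3,) + start) < tau
```

Here `extension((3, 2, 3, 2, 3))` returns 3 2 3 2 3 2 3 3 2, and `is_rotten((3, 2, 3, 2, 3))` is true because the tail shrinks from 4 to 2 when a 2 is prefixed.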

22 333 32323 323232 2323232 3232323 22322232 

23222322 23223223 33233233 223222322 223222323 232223222 332332332 

2232223222 2232223223 2232223232 2322232223 2322322322 2332332332 3322332233 
3323323323 22322232223 22322232232 22322232322 22322322232 22322322322 22323222322 



Table 13: The first 28 rotten sequences (A216730). 



0 1 1 0 1 1 2 4 4 8
14 11 18 30 26 24 40 35 58 69
48 84 158 67 139 287 215 242 490 323

Table 14: Number of rotten sequences of lengths 1 through 30 (A216950).
However, up to length 30 there are no doubly rotten sequences.

Conjecture 3. Doubly rotten sequences do not exist.

If this conjecture were true, it would imply that one can always prefix a starting sequence $S_0$
by one of {2, 3} without decreasing the tail length. This would explain the observation made in
§2.2 about the behavior of Ω(n) between jump points. It would also imply that Ω(n + 1) ≥ Ω(n)
for all n, something that we do not know at present.

4.4 Sequences in which first term is essential. 

A statistic that is relevant to the study of rotten sequences is the following. If a starting sequence
$S_0$ of length n is chosen at random, and has curling number k, this means we can write $S_0 = XY^k$
for suitable sequences X, Y. What is the probability that we must necessarily take X to be the
empty sequence, i.e., that the only such representation goes all the way back to the beginning of
$S_0$ (and so the first term is essential for the computation of the curling number)? 2 2 3 2 2 3 is an
example of such a sequence, since here k = 2, and X = ε, Y = 2 2 3 is the only representation. But
2 3 3 2 3 3 is not, since k = 2 and we can either take X = ε, Y = 2 3 3 or X = 2 3 3 2, Y = 3, and the
latter representation avoids using X = ε. The numbers of such sequences of lengths 1 ≤ n ≤ 30
are given in Table 15. If n is prime, the number is 2, but the limit supremum of these numbers
appears to grow exponentially.

2 2 2 4 2 8 2 10 8 14
2 40 2 40 32 88 2 192 2 324
100 564 2 1356 32 2226 370 4564 2 9656

Table 15: Number of sequences of lengths 1 through 30 whose curling number representation $XY^k$
requires X = ε (A216951).
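
The first entries of Table 15 can be recomputed by enumeration. In the sketch below (our own names and our own reading of the definition), a sequence counts if every block length p that attains the curling number k satisfies kp = n, so that no representation with a nonempty X exists.

```python
from itertools import product


def repeats(seq, p):
    """Number of consecutive copies of the last p terms at the end of seq."""
    n, k = len(seq), 1
    while (k + 1) * p <= n and seq[n - (k + 1) * p:n - k * p] == seq[n - p:]:
        k += 1
    return k


def curling_number(seq):
    """Maximum of repeats(seq, p) over all block lengths p."""
    return max(repeats(seq, p) for p in range(1, len(seq) + 1))


def needs_empty_prefix(seq):
    """True if every representation seq = X Y^k with k = cn(seq) has X empty."""
    n, k = len(seq), curling_number(seq)
    return all(k * p == n for p in range(1, n + 1) if repeats(seq, p) == k)


def count_first_term_essential(n):
    """Number of sequences of n 2's and 3's for which X must be empty."""
    return sum(1 for s in product((2, 3), repeat=n) if needs_empty_prefix(s))
```

As in the text, `needs_empty_prefix((2, 2, 3, 2, 2, 3))` is true, while `needs_empty_prefix((2, 3, 3, 2, 3, 3))` is false because Y = 3 also attains k = 2.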



4.5 Sequences where prefix increases tail. 

In contrast to "rotten" sequences, we also investigated starting sequences $S_0$ for which either
$\tau(2S_0) > \tau(S_0)$ or $\tau(3S_0) > \tau(S_0)$. $S_0$ = 2 2 3 2 2 is an example, since $\tau(S_0) = 2$, $\tau(2S_0) = 8$,
$\tau(3S_0) = 2$. The numbers of such sequences of lengths 1 through 30 are shown in Table 16. There
are rather more of these than there are rotten sequences, although we found no example where
both $\tau(2S_0) > \tau(S_0)$ and $\tau(3S_0) > \tau(S_0)$ hold.

2 1 2 1 5 3 12 9 19 16
38 20 59 42 104 65 213 111 400 245
765 439 1563 820 3046 1731 5955 3292 12078 6343

Table 16: Number of sequences $S_0$ of lengths 1 through 30 such that $\tau(2S_0) > \tau(S_0)$ or $\tau(3S_0) > \tau(S_0)$
(A217437).

5 Gijswijt's sequence

If we simply start with $S_0 = 1$, and generate an infinite sequence by continually appending the
curling number of the current sequence, as in (1), we obtain

G := 1 1 2 1 1 2 2 2 3 1 1 2 1 1 2 2 2 3 2 1 1 2 1 1 2 2 2 3 1 1 2 1 1 ...

This is Gijswijt's sequence, A090822, invented by Dion Gijswijt in 2004, and analyzed by van der
Bult et al. [2].
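
The initial terms of G are easy to generate. The sketch below (our own function names) appends curling numbers starting from the single term 1; it is a naive implementation whose cost per term grows with the length of the sequence, so it is suitable only for a few thousand terms.

```python
def curling_number(seq):
    """Largest k such that seq ends with k consecutive copies of some nonempty block."""
    n, best = len(seq), 1
    for p in range(1, n // 2 + 1):
        k = 1
        while (k + 1) * p <= n and seq[n - (k + 1) * p:n - k * p] == seq[n - p:]:
            k += 1
        best = max(best, k)
    return best


def gijswijt(n_terms):
    """First n_terms terms of Gijswijt's sequence (n_terms >= 1)."""
    seq = [1]
    while len(seq) < n_terms:
        seq.append(curling_number(seq))
    return seq


if __name__ == "__main__":
    g = gijswijt(220)
    print("".join(map(str, g[:33])))
    print("first 4 at term", g.index(4) + 1)
```

Running it prints the 33 terms displayed above and reports the first 4 at term 220.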

The first time a 4 appears in G is at term 220. One can calculate quite a few million terms
without finding a 5 (as the authors of [2] discovered), but in [2] it was shown that a 5 eventually
appears for the first time at about term

$$10^{10^{23}}.$$

Reference [2] also shows that G is in fact unbounded, and conjectures that the first time that a
number m ≥ 6 appears is at about term number

$$2^{2^{3^{4^{\cdot^{\cdot^{\cdot^{m-1}}}}}}},$$

a tower of height m - 1. The fairly complicated arguments used in [2] could be considerably
simplified and extended if the Curling Number Conjecture were known to be true.

Our final theorem shows that if the Curling Number Conjecture is true, any starting sequence
S that does not contain a 1 must eventually merge with G.

Theorem 21. Assume the Curling Number Conjecture is true. Let S be an initial sequence not
containing a 1, let $S^{(e)}$ be its "extension" (defined in §1), and let $S^{(\infty)}$ be its infinite continuation.
Then $S^{(\infty)} = S^{(e)} G$.

Proof. By definition, $S^{(e)}$ does not contain a 1 but is immediately followed by a 1. Suppose
$S^{(\infty)} \ne S^{(e)} G$, and suppose they first differ at a position where $S^{(\infty)}$ is n, say, whereas $S^{(e)} G$ is
m < n. This n must be the curling number of some portion of $S^{(\infty)}$ that begins with a suffix
X, say, of $S^{(e)}$. Let $S^{(e)} = WX$. Then $S^{(\infty)} = W(XT)^n\, n \cdots$ for some prefix T of G, whereas
$G = T(XT)^{n-1}\, m \cdots$. If n = 2, m = 1, we have $G = TXT1\cdots$. The curling number of the
first copy of T is the first term of X, which is not 1, but the curling number of the second T
is 1, a contradiction. On the other hand, if n ≥ 3, $G = TXTXT \cdots XTm \cdots$, and the initial
TXTX has curling number at least 2 and cannot be followed by T (which begins with 1), again a
contradiction. □

We do not know if the theorem is still true if S is allowed to contain a 1 but does not end with
1.

6 Open questions and topics for future research. 

1. Is the Curling Number Conjecture (even just for the case of sequences of 2's and 3's) true? 

2. It would be nice to have some further exact values of Ω(n), beyond n = 48, even though they
will require extensive computations.

3. What is the asymptotic behavior of Ω(n)?

4. Can the especially good starting sequences shown in Tables 2 and 3 (in particular those of
lengths 22, 48 and 77) be generalized? What makes them so special?

5. Can the properties of good starting sequences mentioned in Conjecture 2 be justified? 

6. Can Shirshov's theorem (see §2.5) be modified so as to apply to our problem? 

7. Are there analogs of Theorems 6 and 19 for p(n, k) (the number of primitive sequences) or 
p'(n, k) (the number of primitive and robust sequences) ? 

8. Are there formulas for c(n, k) that are more explicit than those given in Theorems 6 and 19? 
Is there a formula that matches the 200 known terms of the c(n, 1) sequence? 

9. Are there formulas or recurrences for the numbers t(n, i) of starting sequences with tail length
i?

10. Is there a probabilistic model that better explains the distribution of values of t(n, i) visible 
in Tables 10-12 and A217209? The model presented in §4.2 is certainly inadequate. 

11. Do "doubly rotten" sequences exist? (See Conjecture 3.)

12. The question implicit in the last sentence of §5. 

7 Appendix: Proofs of Theorems 12, 13, 14. 

7.1 Theorem 12. 

Proof. The first statement follows immediately from Theorem 7, taking S and T in that theorem
to be YX and X respectively. To prove the second statement, let x := |X|, y := |Y|, s := |S|,
t := |T|, and note that YX = STS implies x + y = 2s + t. Also X = BY (say), with $B \ne \epsilon$.

We are to show that s = y or s ≥ 2y. First, suppose that y < s < 2y. Since s > y, there exists a
sequence U with |U| = s - y such that S = YU. Then we have the following chains of implications
[the successive assertions are enclosed in square brackets]: [s > y] ⇒ [s > y - t] ⇒ [x = 2s + t - y >
s] ⇒ [$X \succ S \succ U$], and [s < 2y] ⇒ [s - y < y] ⇒ [|U| < |Y|] ⇒ [$Y \succ U$ (since YX = YBY =
STYU)] ⇒ [Y = CU (say)] ⇒ [$X \succ S = CUU$] ⇒ [cn(X) > 1], a contradiction.

Second, suppose that s < y. Then there exists $U \ne \epsilon$ with |U| = y - s such that Y = SU. But
x > y > s and YX = STS imply $X \succ S$, and since $X \succ Y$ then $X \succ U$ also. If $S \succeq U$ then
$YX = YBSU \succeq UU$, which contradicts cn(YX) = 1. Hence [$U \succ S$] ⇒ [s < y - s] ⇒ [2s < y] ⇒
[x + y = 2s + t < y + t < y + x], since $X \succ T$. Since this is impossible, s < y is also impossible. □

Note that the condition |S| = |Y| is equivalent to 2|Y| > |X|: if s = y then x = y + t,
which implies 2y = s + y > t + y (since t < s). Conversely, if s ≠ y then s ≥ 2y, which implies
x + y = 2s + t ≥ 4y + t, x ≥ 3y + t, so x > 2y. Similar reasoning shows that the condition |S| ≥ 2|Y|
is equivalent to 3|Y| < |X|.

7.2 Theorem 13. 

Proof. If $X \in A(n,i) \setminus B(n,i)$ then we may apply Theorem 12 to X, taking $Y = X^{(i)}$, with |X| = n,
|Y| = i, where n/2 ≤ i < n. So there exist S, T with YX = STS, $X \succeq T$, $S \succ T$, and either
|S| = |Y| or |S| ≥ 2|Y|. We cannot have |S| ≥ 2|Y|, since that implies |S| ≥ n, 2|S| ≥ 2n > |YX|,
which contradicts YX = STS. So |S| = |Y|, Y = S, X = TY, and |T| = n - i. Also cn(YX) = 1
by definition of A(n,i), i.e., cn(YTY) = 1, so $Y \in B(i, n-i)$.

The map from X to Y is one-to-one, since Y in turn determines X = TY with $T = Y^{(n-i)}$. To show it is onto, take
$Y \in B(i, n-i)$, let $Q = Y^{(n-i)}$, and define P by Y = PQ and set X := QY = QPQ. Then we have
cn(YQY) = cn(YX) = 1, so $X \in A(n,i)$. Also XYX = QPQ PQ QPQ has curling number at
least 2, so $X \notin B(n,i)$. Hence $X \in A(n,i) \setminus B(n,i)$. □

7.3 Theorem 14. 

Proof. Corollary 9 ensures that the E sets in the sum are disjoint, so we just need to
establish a bijection between the elements of A(n,i) \ B(n,i) and the disjoint union of the E sets
defined by the range of m.

As in the previous proof, if $X \in A(n,i) \setminus B(n,i)$, then we may apply Theorem 12 to X, taking
$Y = X^{(i)}$, with |X| = n, |Y| = i, where now 1 ≤ i < n/3. There exist S, T with YX = STS,
$X \succeq T$, $S \succ T$, and either |S| = |Y| or |S| ≥ 2|Y|. Let |S| = m, |T| = n + i - 2m. As before,
$S \in B(m, |T|)$. There are three conditions that m must satisfy: (i) |T| ≥ 1 implies m ≤ (n + i - 1)/2;
(ii) |S| > |T| implies $m \ge \lceil (n+i)/3 \rceil$; (iii) m = i < n/3 is incompatible with YX = STS, so
m ≥ 2i.

Since m > i, we may write S = YU, with |U| = m - i. Since m ≥ 2i, m - i ≥ i and |U| ≥ |Y|.
Now $X \succ S$, so $X \succ U$ and therefore $U \succeq Y$. Since $S = YU \in B(m, |T|)$, $U \in E(m-i, i, |T|)$. The
mapping $X \mapsto U$ is one-to-one since X determines $Y = X^{(i)}$, S and T are unique by Theorem 8,
m = |S|, and S = YU determines U.

To show the map is onto, suppose $U \in E(m-i, i, n+i-2m)$ for some m satisfying conditions
(i)-(iii) above. Then set $Y = U^{(i)}$, S = YU, $T = S^{(n+i-2m)}$, and X = UTS. Then YX = STS,
so that $U \in E(m-i, i, n+i-2m)$ implies $X \in A(n,i)$. But $XYX = XSTS \succeq TSTS$, so
cn(XYX) > 1 and therefore $X \notin B(n,i)$. □